Surface the self-approval prohibition at the top of verifier.json by pengfei-threemoonslab · Pull Request #148 · ThreeMoonsLab/agents-shipgate

pengfei-threemoonslab · 2026-05-30T06:57:02Z

What

Promotes the self-approval prohibition to the top of verifier.json. When a PR edits the rules that evaluate it — a weakened release policy or a touched trust root — a coding agent must never silently self-approve (reward hacking). #146 carried that message inside a fix_task instruction; this surfaces it in the two fields an agent reads first.

Why

The verifier already detects policy_weakened / trust_root_touched and routes them to a human, but the agent-facing headline and human_review.why still showed the generic scan headline. An agent skimming the top of the artifact wouldn't see the most important fact: you cannot clear your own gate.

Changes

_self_approval_note() — the explicit "a coding agent cannot self-approve that change — a human must review it" message. policy_weakened takes precedence over trust_root_touched; clean reviews get no note.
headline leads with the note when present (ahead of agent_summary.headline).
human_review.why leads with the note, and a note forces human_review.required = True regardless of the verdict path — defense in depth so a weakened policy can never be marked agent-clearable.
8 unit tests (tests/test_self_approval_signal.py).

Verification

Full suite 2346 passed, 4 skipped, 0 failed; generate_schemas.py --check clean (no schema change — additive logic over the existing capability_review flags); ruff clean.

🤖 Generated with Claude Code

When a PR weakens the release policy or touches a trust root, a coding agent must not silently self-approve a change to its own gate. That prohibition was only present inside a fix_task instruction (PR #146); promote it to the two fields an agent reads first.

- Add _self_approval_note(): the explicit "a coding agent cannot self-approve that change - a human must review it" message for policy_weakened (taking precedence) and trust_root_touched.
- verifier.json headline leads with the note when present.
- human_review.why leads with the note, and a self-approval note forces human_review.required=True regardless of the verdict path.

Full suite: 2346 passed, 4 skipped. No schema change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ew fix)

Addresses review of #148: a self-approval note forced human_review.required=True, but can_merge_without_human and first_next_action still keyed only off merge_verdict, so the defensive (mergeable + note) path could emit "human review required" and "safe to merge" at once.

- _can_merge_without_human returns False whenever a self-approval note exists.
- _first_next_action routes to a human review (never the "safe to merge" action) when a self-approval note is present, including the fix_task-None defensive case.
- Both thread capability_review from _build_verifier.

Clean mergeable behavior (no note) is unchanged; covered by a regression test.

Full suite: 2349 passed, 4 skipped. No schema change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

pengfei-threemoonslab and others added 2 commits May 29, 2026 23:56

pengfei-threemoonslab merged commit 48686d9 into main May 30, 2026
1 check passed

pengfei-threemoonslab deleted the feat/verifier-self-approve-signal branch May 30, 2026 22:24

pengfei-threemoonslab merged 2 commits into main from feat/verifier-self-approve-signal
